Smart Paper Notes

VulCurator: A Vulnerability-Fixing Commit Detector

Truong Giang Nguyen, Thanh Le-Cong, Hong Jin Kang, Xuan-Bach D. Le, David Lo

Category: Artificial Intelligence

2022-09-07

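Open-source software (OSS) vulnerability management has become increasingly important as the number of discovered OSS vulnerabilities grows over time. Monitoring vulnerability-fixing commits is part of the standard process for preventing vulnerability exploitation. However, manually detecting vulnerability-fixing commits is time-consuming because of the potentially large number of commits to review. Recently, many techniques have been proposed to detect vulnerability-fixing commits automatically using machine learning. These solutions either (1) do not use deep learning, or (2) use deep learning on only limited sources of information. This paper proposes VulCurator, a tool that leverages richer sources of information, including commit messages, code changes, and issue reports, for vulnerability-fixing commit classification. Our experimental results show that VulCurator outperforms state-of-the-art baselines in terms of F1-score. The VulCurator tool is publicly available at https://github.com/ntgiang71096/vfdetector and https://zenodo.org/record/7034132#.yw3mn-xbzdi.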
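
The comparison with the baselines is stated in terms of F1-score. The following minimal Python sketch shows what that metric measures: it computes F1 for the positive class (vulnerability-fixing commits). The function name and the six labeled commits are hypothetical illustration data, not VulCurator's code or results.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = vulnerability-fixing commit)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fixes correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ordinary commits flagged as fixes
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fixes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and predictions for six commits.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # precision = recall = 2/3, so F1 = 0.667
```

F1 is the harmonic mean of precision and recall. A detector cannot score well by flagging every commit (low precision), and it cannot score well by flagging almost none (low recall).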

AutoPruner: Transformer-Based Call Graph Pruning

Thanh Le-Cong, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus Haryono, David Lo, Xuan-Bach D. Le, Huynh Quyet Thang

Category: Artificial Intelligence

2022-09-07

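Constructing a static call graph requires trade-offs between soundness and precision. Unfortunately, the program analysis techniques used to construct call graphs are often imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning, which post-processes call graphs constructed by static analysis. A machine learning model is built to capture information from the call graph by extracting structural features for use in a random forest classifier. It then removes edges that are predicted to be false positives. Despite the improvements shown by these machine learning models, they are still limited: they do not consider source code semantics and therefore often cannot effectively distinguish true positives from false positives. In this paper, we present AutoPruner, a novel call graph pruning technique that eliminates false positives in call graphs via both statistical semantic analysis and structural analysis. Given a call graph constructed by traditional static analysis tools, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph. To do so, AutoPruner fine-tunes a model of code that was pre-trained on a large corpus to represent source code based on descriptions of its semantics. Next, the model is used to extract semantic features from the functions related to each edge in the call graph. AutoPruner uses these semantic features, together with structural features extracted from the call graph, to classify each edge via a feed-forward neural network. Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines, improving F-measure by up to 13% in identifying false-positive edges in a static call graph.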
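
The final step described above can be sketched in plain Python. In that step, each edge is classified from its semantic plus structural features with a feed-forward network, and predicted false positives are removed. This is an illustration under stated assumptions, not AutoPruner's implementation:

- The two-number `semantic` vectors stand in for the features a fine-tuned code model would extract.
- The single structural feature and the hand-set weights are hypothetical.
- No training is performed.

```python
import math


def dense(xs, weights, bias):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]


def edge_score(semantic, structural, params):
    """Estimated probability that a call-graph edge is a true edge."""
    x = semantic + structural  # concatenate semantic and structural features
    hidden = [max(0.0, h) for h in dense(x, params["W1"], params["b1"])]  # ReLU
    logit = dense(hidden, params["W2"], params["b2"])[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


def prune(edges, params, threshold=0.5):
    """Drop edges the classifier predicts to be false positives."""
    return [e for e in edges
            if edge_score(e["semantic"], e["structural"], params) >= threshold]


# Hypothetical hand-set weights (2 semantic + 1 structural input, 2 hidden units).
params = {
    "W1": [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]],
    "b1": [0.0, 0.0],
    "W2": [[2.0, -2.0]],
    "b2": [0.0],
}
# Hypothetical edges; "semantic" stands in for code-model features of caller and callee.
edges = [
    {"caller": "Main.run", "callee": "Parser.parse", "semantic": [0.9, 0.1], "structural": [1.0]},
    {"caller": "Main.run", "callee": "Plugin.load", "semantic": [0.1, 0.8], "structural": [0.0]},
]
print([e["callee"] for e in prune(edges, params)])  # ['Parser.parse']
```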

Visual correspondence-based explanations improve AI robustness and human-AI team accuracy

Giang Nguyen, Mohammad Reza Taesiri, Anh Nguyen

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-07-26

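Explaining artificial intelligence (AI) predictions is increasingly important, and even imperative, in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of self-interpretable image classifiers. Rather than producing post-hoc explanations, they first explain and then predict, by harnessing the visual correspondences between a query image and exemplars. Compared with ResNet-50 and a k-nearest neighbor (kNN) classifier, our models consistently improve on out-of-distribution (OOD) datasets (by 1 to 4 points), while performing marginally worse (by 1 to 2 points) on in-distribution tests. Through a large-scale human study on ImageNet and CUB, we find that our correspondence-based explanations are more useful to users than kNN explanations. Our explanations help users reject the AI's wrong decisions more accurately than all other tested methods. Interestingly, we show for the first time that it is possible to achieve complementary human-AI team accuracy (i.e., higher than either AI alone or humans alone) on the ImageNet and CUB image classification tasks.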
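
The kNN baseline mentioned above both predicts and explains through its nearest training exemplars. Here is a minimal sketch, assuming images have already been mapped to feature vectors. The 2-D vectors and image ids are hypothetical stand-ins for deep features (for example, ResNet-50 embeddings). It illustrates the baseline only, not the paper's correspondence-based architectures.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_predict(query, train, k=3):
    """Return (predicted label, ids of the k nearest exemplars shown as the explanation)."""
    ranked = sorted(train, key=lambda ex: cosine(query, ex["feat"]), reverse=True)
    neighbors = ranked[:k]
    label = Counter(ex["label"] for ex in neighbors).most_common(1)[0][0]  # majority vote
    return label, [ex["id"] for ex in neighbors]


# Hypothetical training exemplars with precomputed features.
train = [
    {"id": "img1", "feat": [1.0, 0.0], "label": "cat"},
    {"id": "img2", "feat": [0.9, 0.2], "label": "cat"},
    {"id": "img3", "feat": [0.0, 1.0], "label": "dog"},
    {"id": "img4", "feat": [0.2, 0.9], "label": "dog"},
]
print(knn_predict([0.8, 0.3], train))  # ('cat', ['img2', 'img1', 'img4'])
```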

Manas: Mining Software Repositories to Assist AutoML

Giang Nguyen, Johir Islam, Rangeet Pan, Hridesh Rajan

Category: Machine Learning

2021-12-06

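Today, deep learning is widely used to build software. A software engineering problem with deep learning is that finding an appropriate convolutional neural network (CNN) model for the task can be a challenge for developers. Recent work on AutoML, more precisely neural architecture search (NAS), embodied by tools such as Auto-Keras, aims to solve this problem by treating it as a search problem. The starting point is a default CNN model, and mutations of this model allow exploration of the space of CNN models to find the one that works best for the problem. These works have had significant success in producing high-accuracy CNN models. There are two problems, however. First, NAS can be very costly, often taking several hours to complete. Second, the CNN models produced by NAS can be very complex, which makes them harder to understand and costlier to train. We propose a novel approach to NAS where, instead of starting from a default CNN model, the initial model is selected from a repository of models extracted from GitHub. The intuition is that developers solving a similar problem may have developed a better starting point than the default model. We also analyze common layer patterns of CNN models in the wild to understand the changes developers make to improve their models. Our approach uses these commonly occurring changes as mutation operators in NAS. We have extended Auto-Keras to implement our approach. We evaluated Manas on the 8 top-voted problems from Kaggle, covering image classification and image regression. Given the same search time and without loss of accuracy, Manas produces models with 42.9% to 99.6% fewer parameters than Auto-Keras's models. Benchmarked on GPU, Manas's models train 30.3% to 641.6% faster than Auto-Keras's models.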
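
Manas's model-size result is a parameter count. The small Python helper below sketches how trainable parameters are counted for common layers, using the standard formulas for a biased 2-D convolution and a fully connected layer. The layer-tuple format and the example CNN are hypothetical, not a model taken from Manas, Auto-Keras, or Kaggle.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable weights of a 2-D convolution: one kernel slice per (input channel, filter) plus biases."""
    return kernel_h * kernel_w * in_channels * filters + (filters if use_bias else 0)


def dense_params(in_features, units, use_bias=True):
    """Trainable weights of a fully connected layer."""
    return in_features * units + (units if use_bias else 0)


def count_params(layers):
    """Sum trainable parameters over a list of hypothetical layer tuples."""
    total = 0
    for layer in layers:
        kind = layer[0]
        if kind == "conv2d":
            _, kh, kw, cin, filters = layer
            total += conv2d_params(kh, kw, cin, filters)
        elif kind == "dense":
            _, fin, units = layer
            total += dense_params(fin, units)
        # Pooling layers (e.g. global average pooling) have no trainable weights.
    return total


# Hypothetical small CNN for 28x28 grayscale images with 10 classes.
model = [
    ("conv2d", 3, 3, 1, 32),    # 3*3*1*32 + 32  = 320
    ("conv2d", 3, 3, 32, 64),   # 3*3*32*64 + 64 = 18,496
    ("global_avg_pool",),       # 0
    ("dense", 64, 10),          # 64*10 + 10     = 650
]
print(count_params(model))  # 19466
```

Convolution cost grows with the product of input channels and filters. A mutation that narrows one wide layer can therefore remove a large share of a model's parameters.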

The effectiveness of feature attribution methods and its correlation with automatic evaluation scores

Giang Nguyen, Daeyoung Kim, Anh Nguyen

Category: Computer Vision | Artificial Intelligence

2021-05-31

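Explaining the decisions of an artificial intelligence (AI) model is increasingly critical in many real-world, high-stakes applications. Hundreds of papers have proposed new feature attribution methods, or discussed or harnessed these tools in their work. However, despite humans being the target end users, most attribution methods were evaluated only on proxy automatic-evaluation metrics (Zhang et al., 2018; Zhou et al., 2016; Petsiuk et al., 2018). In this paper, we conduct the first user study to measure how effectively attribution maps assist humans in ImageNet classification and Stanford Dogs fine-grained classification. The study covers both natural images and adversarial images (i.e., images containing adversarial perturbations). Overall, feature attribution is surprisingly not more effective than showing humans the nearest training-set examples. On the harder task of fine-grained dog categorization, presenting attribution maps to humans does not help. Instead, it hurts the performance of human-AI teams compared to AI alone. Importantly, we find that automatic attribution-map evaluation measures correlate poorly with actual human-AI team performance. Our findings encourage the community to rigorously test their methods on downstream human-in-the-loop applications and to rethink existing evaluation metrics.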
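
The claim that automatic attribution-map scores correlate poorly with human-AI team performance can be quantified with a rank correlation. Below is a minimal sketch of Spearman's rho in plain Python. It assumes one automatic score and one human-AI accuracy per attribution method. The abstract does not say which correlation statistic the paper uses, and the four score pairs are invented for illustration.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical: automatic metric score vs. human-AI team accuracy for four methods.
metric = [0.62, 0.55, 0.71, 0.48]
human_acc = [0.70, 0.74, 0.69, 0.72]
print(round(spearman(metric, human_acc), 3))
```

A value near +1 would mean the automatic metric ranks methods the way human-AI teams do. Values near zero or negative mean the metric is a poor proxy.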

Explaining How Deep Neural Networks Forget by Deep Visualization

Giang Nguyen, Shuan Chen, Tae Joon Jun, Daeyoung Kim

Category: Machine Learning | Computer Vision

2020-05-03

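Explaining the behavior of deep neural networks, which are usually considered black boxes, is critical, especially now that they are being adopted across diverse aspects of human life. Taking advantage of interpretable machine learning (interpretable ML), this paper proposes a novel tool called Catastrophic Forgetting Dissector (CFD) to explain catastrophic forgetting in continual learning settings. We also introduce a new method called Critical Freezing, based on the observations made with our tool. Experiments on ResNet articulate how catastrophic forgetting happens, in particular showing which components of this well-known network are forgetting. Our new continual learning algorithm defeats various recent techniques by a significant margin, demonstrating the value of the investigation. Critical Freezing not only attacks catastrophic forgetting but also provides explainability.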
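
The abstract names Critical Freezing but does not specify it. The sketch below shows only the generic mechanism that any layer-freezing scheme relies on when training on a new task. Parameters of selected layers are copied unchanged, while the other layers receive an SGD update. The layer names, the gradients, and the choice to freeze `block1` are all hypothetical. In the paper, CFD's observations inform which components to freeze; that selection rule is not reproduced here.

```python
def sgd_step(params, grads, lr, frozen):
    """One SGD update that leaves every layer named in `frozen` unchanged."""
    updated = {}
    for name, weights in params.items():
        if name in frozen:
            updated[name] = list(weights)  # frozen layer: copy weights as they are
        else:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated


# Hypothetical layers, new-task gradients, and a hypothetical choice of frozen layer.
params = {"block1": [0.5, -0.2], "block2": [0.1, 0.3], "fc": [1.0]}
grads = {"block1": [1.0, 1.0], "block2": [1.0, -1.0], "fc": [0.5]}
new_params = sgd_step(params, grads, lr=0.1, frozen={"block1"})
print(new_params)
```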

Machine Learning Approach to Polymerization Reaction Engineering: Determining Monomers Reactivity Ratios

Tung Nguyen, Mona Bavarian

Category: Machine Learning

2023-01-03

Here, we demonstrate how machine learning enables the prediction of comonomer reactivity ratios from the molecular structures of the monomers. We combined multi-task learning, multiple inputs, and a Graph Attention Network to build a model capable of predicting reactivity ratios from the monomers' chemical structures.
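
Reactivity ratios are the parameters of the Mayo-Lewis copolymer equation, with r1 = k11/k12 and r2 = k22/k21. The function below sketches one way predicted ratios could be used downstream: it computes the instantaneous copolymer composition F1 from the feed composition f1. This downstream use is an assumption, not something the abstract describes, and the ratio values are hypothetical.

```python
def copolymer_composition(f1, r1, r2):
    """Mayo-Lewis equation: instantaneous mole fraction F1 of monomer 1 in the copolymer.

    f1 is the mole fraction of monomer 1 in the feed; r1 = k11/k12 and r2 = k22/k21.
    """
    f2 = 1.0 - f1
    numerator = r1 * f1 ** 2 + f1 * f2
    denominator = r1 * f1 ** 2 + 2 * f1 * f2 + r2 * f2 ** 2
    return numerator / denominator


# Hypothetical reactivity ratios, e.g. as a model might predict them.
for f1 in (0.2, 0.5, 0.8):
    print(f1, round(copolymer_composition(f1, r1=0.5, r2=0.5), 3))
```

Two limiting cases serve as sanity checks. With r1 = r2 = 1 the copolymer composition equals the feed composition (ideal random copolymerization). With r1 = r2 = 0 it is fixed at 0.5 (strict alternation).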

Neural Collapse in Deep Linear Network: From Balanced to Imbalanced Data

Hien Dang, Tan Nguyen, Tho Tran, Hung Tran, Nhat Ho

Category: Machine Learning | Machine Learning (Statistics)

2023-01-01

Modern deep neural networks have achieved superhuman performance in tasks ranging from image classification to game play. Surprisingly, these complex systems, with their massive numbers of parameters, exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon, known as "Neural Collapse," was discovered empirically by Papyan et al. (2020). Recent papers have shown theoretically that, under a simplified "unconstrained feature model," the global solutions to the network training problem exhibit this phenomenon. We take a step further and prove that Neural Collapse occurs in deep linear networks for the popular mean squared error (MSE) and cross-entropy (CE) losses. Furthermore, we extend our research to imbalanced data under the MSE loss and present the first geometric analysis of Neural Collapse in this setting.
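
One structural property usually grouped under Neural Collapse concerns the last-layer features. Their centered class means converge to a simplex equiangular tight frame (ETF): K equal-norm vectors whose pairwise cosines all equal -1/(K-1). The abstract does not list which properties the paper proves. This Python sketch therefore only constructs the standard K-class simplex ETF and checks that geometry numerically.

```python
import itertools
import math


def simplex_etf(k):
    """Rows of sqrt(k/(k-1)) * (I - 11^T / k): k unit vectors forming a simplex ETF."""
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)] for i in range(k)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


K = 4
means = simplex_etf(K)  # idealized centered class means under Neural Collapse
for a, b in itertools.combinations(range(K), 2):
    print(a, b, round(cosine(means[a], means[b]), 6))  # each pair has cosine -1/(K-1)
```

The value -1/(K-1) is the most negative common cosine that K vectors can share. Collapsed class means are therefore spread as far apart as possible.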

Integrating Semantic Information into Sketchy Reading Module of Retro-Reader for Vietnamese Machine Reading Comprehension

Hang Thi-Thu Le, Viet-Duc Ho, Duc-Vu Nguyen, Ngan Luu-Thuy Nguyen

Category: Natural Language Processing

2023-01-01

Machine Reading Comprehension (MRC) has become one of the most advanced and popular research topics in Natural Language Processing in recent years. Classifying whether a question is answerable is a relatively significant sub-task of machine reading comprehension; however, it has not been studied much. Retro-Reader is one of the studies that has solved this problem effectively. However, the encoders of most traditional machine reading comprehension models, and of Retro-Reader in particular, have not been able to fully exploit the semantic information of the context. Inspired by SemBERT, we use semantic role labels from the semantic role labeling (SRL) task to add semantics to pre-trained language models such as mBERT, XLM-R, and PhoBERT. This experiment was conducted to compare the influence of semantics on answerability classification for Vietnamese machine reading comprehension. Additionally, we hope this experiment will enhance the encoder of the Retro-Reader model's Sketchy Reading Module. The Retro-Reader model with the semantics-enhanced encoder was applied to the Vietnamese Machine Reading Comprehension task for the first time and obtained positive results.
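
In the spirit of SemBERT, the simplest way to inject semantic role labels into a pre-trained encoder is to concatenate each token's contextual vector with an embedding of its SRL tag. The sketch below assumes one SRL tag per token. The tag embeddings, the 3-dimensional vectors standing in for PhoBERT (or mBERT/XLM-R) outputs, and the function name are hypothetical. SemBERT's actual pipeline is more involved.

```python
# Hypothetical 2-d embeddings for a few SRL tags (these would be learned in practice).
SRL_TAG_EMBEDDINGS = {
    "O": [0.0, 0.0],
    "B-V": [1.0, 0.0],
    "B-ARG0": [0.0, 1.0],
    "B-ARG1": [0.5, 0.5],
}


def fuse_semantics(token_vectors, srl_tags, tag_table=SRL_TAG_EMBEDDINGS):
    """Concatenate each token's contextual vector with the embedding of its SRL tag."""
    if len(token_vectors) != len(srl_tags):
        raise ValueError("expected exactly one SRL tag per token")
    return [vec + tag_table[tag] for vec, tag in zip(token_vectors, srl_tags)]


# "Nam đọc sách" ("Nam reads a book"): hypothetical 3-d contextual vectors
# standing in for the output of an encoder such as PhoBERT.
tokens = ["Nam", "đọc", "sách"]
contextual = [[0.2, 0.1, 0.4], [0.7, 0.3, 0.0], [0.1, 0.9, 0.2]]
tags = ["B-ARG0", "B-V", "B-ARG1"]
fused = fuse_semantics(contextual, tags)
for token, vec in zip(tokens, fused):
    print(token, vec)
```

The fused vectors, each longer by the tag-embedding size, would then feed the downstream answerability classifier in place of the plain contextual vectors.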

Leveraging Semantic Representations Combined with Contextual Word Representations for Recognizing Textual Entailment in Vietnamese

Quoc-Loc Duong, Duc-Vu Nguyen, Ngan Luu-Thuy Nguyen

Category: Natural Language Processing

2023-01-01

Recognizing textual entailment (RTE) is a significant problem with a reasonably active research community. Proposed approaches to this problem are quite diverse and follow many different directions. For Vietnamese, RTE is a relatively new problem, but it plays a vital role in natural language understanding systems. Currently, methods based on contextual word representation learning models give outstanding results on this problem. However, Vietnamese is a semantically rich language. Therefore, in this paper, we present an experiment for the RTE problem. It combines semantic word representations obtained through the semantic role labeling (SRL) task with the contextual representations of BERT-related models. The experimental results allow us to draw conclusions about the influence and role of semantic representation in Vietnamese natural language understanding. They show that the semantic-aware contextual representation model performs about 1% better than the model without semantic representation. In addition, the effect on Vietnamese data is larger than on English data. This result also shows the positive influence of SRL on the RTE problem in Vietnamese.
